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Abstract 



We study the statistical properties of volatility — a measure of how much 
the market is likely to fluctuate. We estimate the volatility by the local 
average of the absolute price changes. We analyze (a) the S&P 500 stock index 
for the 13-year period Jan 1984 to Dec 1996 and (b) the market capitalizations 
of the largest 500 companies registered in the Trades and Quotes data base, 
documenting all trades for all the securities listed in the three major stock 
exchanges in the US for the 2-year period Jan 1994 to Dec 1995. For the 
S&P 500 index, the probability density function of the volatility can be fit 
with a log-normal form in the center. However, the asymptotic behavior is 
better described by a power-law distribution characterized by an exponent 
1 + [i ~ 4. For individual companies, we find a similar power law asymptotic 
behavior of the probability distribution of volatility with exponent 1 + [/, ~ 
4. In addition, we find that the volatility distribution scales for a range of 
time intervals. Further, we study the correlation function of the volatility 
and find power law decay with long persistence for the S&P 500 index and 
the individual companies with a crossover at approximately 1.5 days. To 



quantify the power-law correlations, we apply power spectrum analysis and 
a recently-developed modified root-mean-square analysis, termed detrended 
fluctuation analysis (DFA). For the S&P 500 stock index, DFA estimates for 
the exponents characterizing the power law correlations are a\ = 0.66 for 
short time scales (within rs 1.5 days) and o?2 = 0.93 for longer time scales 
(up to a year). For individual companies, we find a\ = 0.60 and ai = 0.74, 
respectively. The power spectrum gives consistent estimates of the two power- 
law exponents. 
PACS numbers: 89.90.+n 
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I. INTRODUCTION 



Physicists are increasingly interested in economic time series analysis for several reasons, 
among which are the following: (i) Economic time series, such as stock market indices or 
currency exchange rates, depend on the evolution of a large number of interacting systems, 
and so is an example of complex evolving systems widely studied in physics, (ii) The 
recent availability of large amounts of data allows the study of economic time series with 
a high accuracy on a wide range of time scales varying from « 1 minute up to ~ 1 year. 
Consequently, a large number of methods developed in statistical physics have been applied 
to characterize the time evolution of stock prices and foreign exchange rates ||lHT^] • 

Previous studies [|IH24}1 show that the stochastic process underlying price changes is 
characterized by several features. The distribution of price changes has pronounced 
tails HH-plllOHI^I in contrast to a Gaussian distribution. The autocorrelation function of 
price changes decays exponentially with a characteristic time of approximately 4 min. How- 
ever, recent studies fI~4}-|2T| show that the amplitude of price changes, measured by the 
absolute value or the square, shows power law correlations with long-range persistence up to 
several months. These long-range dependencies are better modeled by defining a "subsidiary 



process" |M|-4Ty]], often referred to as the volatility in economic literature. The volatility of 
stock price changes is a measure of how much the market is liable to fluctuate. The first 
step is to construct an estimator for the volatility. Here, we estimate the volatility as the 
local average of the absolute price changes. 

Understanding the statistical properties of the volatility also has important practical 
implications. Volatility is of interest to traders because it quantifies the risk [l[ and is the 
key input of virtually all option pricing models, including the classic Black and Scholes 
model and the Cox, Ross, and Rubinstein binomial models that are based on estimates 



of the asset's volatility over the remaining life of the option p5p6| . Without an efficient 
volatility estimate, it would be difficult for traders to identify situations in which options 
appear to be under-priced or overpriced. 
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We focus on two basic statistical properties of the volatility — the probability distribution 
function and the two-point autocorrelation function. The paper is organized as follows. In 
Section 2, we briefly describe the databases used in this study, the S&P 500 stock index and 
individual company stock prices. In Section 3, we discuss the quantification of volatility. In 
Section 4, the probability distribution function is studied, and in Section 5, the volatility 
correlations are studied. The appendix briefly describes a recently-developed method, called 
detrended fluctuation analysis (DFA) that we use to quantify power-law correlations. 

II. DATA ANALYZED 

A. S&P 500 stock index 

The S&P 500 index from the New York Stock Exchange (NYSE) consists of 500 compa- 
nies chosen for their market size, liquidity, and industry group representation in the US. It 
is a market- value weighted index, i.e., each stock is weighted proportional to its stock price 
times number of shares outstanding. The S&P 500 index is one of the most widely used 
benchmarks of U.S. equity performance. We analyze the S&P 500 historical data, for the 
13-year period Jan 1984 to Dec 1996 (Fig. |T](a)) with a recording frequency of 15 seconds 
intervals. The total number of data points in this 13-year period exceed 4.5 million, and 
allows for a detailed statistical analysis. 

B. Individual company stocks 

We also analyze the Trades and Quotes (TAQ) database which documents every trade 
for all the securities listed in the three major US stock markets — the New York Stock 
Exchange (NYSE), the American Stock Exchange (AMEX), and the National Association of 
Securities Dealers Automated Quotation (NASDAQ) — for the 2-year period from Jan. 1994 
to Dec. 1995 [P7[| . We study the market capitalizations PS| for the 500 largest companies, 
ranked according to the market capitalization on Jan. 1 1994. The S&P500 index at anytime 
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is approximately the sum of market capitalizations of these 500 companies [29|. The total 
number of data points analyzed exceed 20 million. 



III. QUANTIFYING VOLATILITY 

The term volatility represents a generic measure of the magnitude of market fluctuations. 
Thus, many different quantitative definitions of volatility are use in the literature. In this 
study, we focus on one of these measures by estimating the volatility as the local average of 
absolute price changes over a suitable time interval T, which is an adjustable parameter of 
our estimate. 

Fig. 0(a) shows the S&P 500 index Z(t) from 1984 to 1996 in semi-log scale. We define 
the price change G(t) as the change in the logarithm of the index, 

G(t) es In Zit + At) - In Zit) - ^ + f { ~ ^ , (1) 

where At is the sampling time interval. In the limit of small changes in Z(t) is approximately 
the relative change, defined by the second equality. We only count time during opening hours 
of the stock market, and remove the nights, weekends and holidays from the data set, i.e., 
the closing and the next opening of the market is considered to be continuous. 

The absolute value of G(t) describes the amplitude of the fluctuation, as shown in 
Fig. |l](b). In comparison to Fig. 0(a), Fig. |l](b) does not show visible global trends due 
to the logarithmic difference. The large values of |G(£)| correspond to the crashes and big 
rallies. 

We define the volatility as the average of |G(t)| over a time window T = n ■ At, i.e., 

i t+n-l 

Vr(t) = - £ \G(t')\, (2) 
n t'=t 

where n is an integer. The above definition can be generalized by replacing with 
|G(t)| 7 , where 7 > 1 gives more weight to the large values of |G(t)| and < 7 < 1 weights 
the small values of |Cr(£)|. 
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There are two parameters in this definition of volatility, At and n. The parameter At is 
the sampling time interval for the data and the parameter n is the moving average window 
size. Note that the definition of the volatility has an intrinsic error associated with it. In 
principle, the larger the choice of time interval T, the more accurate the volatility estimation. 
However, a large value of T also implies poor resolution in time. 

Fig. ^| shows the calculated volatility Vr{t) for a large averaging window T = 8190 min 
(about 1 month) with At = 30 min. The volatility fluctuates strongly during the crash of 
'87. We also note that periods of high volatility are not sparse but tend to "cluster" . This 
clustering is especially marked around the '87 crash. The oscillatory patterns before the 
crash could be possible precursors (possibly related to the oscillatory patterns postulated 
in Clustering also occurs in other periods, e.g. in the second half of '90. There are 

also extended periods where the volatility remains at a rather low level, e.g. the years of '85 
and '93. 



IV. VOLATILITY DISTRIBUTION 
A. Volatility distribution of the S&P 500 index 

1. Center part of the distribution 

Fig. |3](a) shows the probability density function P(Vr) of the volatility for several values 
of T with At = 30 min. The central part shows a quadratic behavior on a log-log scale 
(Fig. |^(a)), consistent with a log- normal distribution pt| . To test this possibility, the 
appropriately-scaled distribution of the volatility is plotted on a log- log plot (Fig. 0(b)). The 
distributions of volatility Vr, for various choices of T (from T = 120 min up to T = 900 min), 
collapse onto one curve and are well fit in the center by a quadratic function on a log-log 



scale. Since the central limit theorem holds also for correlated series |30 |, with a slower 
convergence than for non-correlated processes |p], pTl , [18| , in the limit of large values of T, 
one expects that P(Vr) becomes Gaussian. However, a log-normal distribution fits the 
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data better than a Gaussian, as is evident in Fig. [| which compares the best log-normal 



fit with the best Gaussian fit for the data | 20[j . The apparent scaling behavior of volatility 



distribution could be attributed to the long persistence of its autocorrelation function [TJ[ 
(Section 5). 

2. Tail of the distribution 

Although the log-normal seems to describe well the center part of the volatility distri- 
bution, Fig. ||(a) suggests that the distribution of the volatility has quite different behavior 
in the tail. Since our time window T for estimating volatility is quite large, it is difficult to 
obtain significant statistics for the tail. Recent studies of the distribution for price changes 
report power law asymptotic behavior [|I],|14],[24]] . Since the volatility is the local average 



of the absolute price changes, it is possible that a similar power law asymptotic behavior 
might characterize the distribution of the volatility. Hence we reduce the time window T 
and focus on the "tail" of the volatility. 

We compute the cumulative distribution of the volatility. Eq. (0) for different time scales, 
Fig. |5](a). We find that the cumulative distribution of the volatility is consistent with a power 
law asymptotic behavior, 

P(V T >x)~^- (3) 

Regression fits yield estimates fi = 3.10 ± 0.08 for T = 32 min with At = 1 min, well outside 
the stable Levy range < /i < 2. 

For larger time scales the asymptotic behavior is difficult to estimate because of poor 
statistics at the tails. In view of the power law asymptotic behavior for the volatility 
distribution, the drop-off of P(Vr) for low values of the volatility could be regarded as a 
truncation to the power law behavior, as opposed to a log-normal. 
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B. Volatility distribution for individual companies 



In this section, we extend the investigation of the nature of this distribution to the 
individual companies comprising the S&P 500, where the amount of data is much larger, 
which allows for better sampling of the tails. 

From the TAQ data base, we analyze 500 time series Si(t), where Si is the market 
capitalization of company % (i.e., the stock price multiplied with the number of outstanding 
shares), i — 1, . . . , 500 is the rank in descending order of the company according to its market 
capitalization on 1 Jan. 1994 and the sampling time is 5 min.The basic quantity studied for 
individual stocks is the change in logarithm of the market capitalization for each company, 

G t (t) = \nSi(t + At) - ]nSi(t) - Si{t + ^ ~ Si{t) , (4) 

where the Si denotes the market capitalization of stock i = 1, . . . , 500 and At = 5min. 

As before, we estimate the volatility at a given time by averaging over a time 

window T = n ■ At, 

1 t+n—l 

V*=V*(t) = ~ £ \Gi(f)\. (5) 
n t'=t 

A normalized volatility is then defined for each company, 

4 EE ^(t) EE _J2 = , (6) 

where (...) denotes the time average estimated by non-overlapping windows for different 
time scales T. 

Fig. [9(a) shows the cumulative probability distribution of the normalized volatility v % T 
for all 500 companies with different averaging windows T, where the sampling interval At = 
5 min. We observe a power law behavior, 

p(4 > x) ~ -1 , (7) 

Regression fits yield \i = 3.10 ± 0.11 for T = 10 min. This behavior is confirmed by the 
probability density function shown in Fig. |6|(b), 
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PM ~ — • (8) 

with a cutoff at small values of the volatility. Regression fits yield the estimate 1 + fi = 
4.06 ± 0.10 for T = 10 min, in good agreement with the estimate of \x from the cumulative 
distribution. Both the probability density and the cumulative distribution, Figs. [7| and || 
show that the volatility distribution for individual companies are consistent with power-law 
asymptotic exponent \i ~ 3, in agreement with the asymptotic behavior of the volatility 
distribution for the S&P 500 index. 

In summary, the asymptotic behavior of the cumulative volatility distribution is well 
described by a power law behavior with exponent fi « 3 for the S&P 500 index. This 
power law behavior also holds for individual companies with similar exponent \i ~ 3 for the 
cumulative distribution, with a drop-off at low values. 



V. CORRELATIONS IN THE VOLATILITY 

A. Volatility correlations for S&P 500 stock index 

Unlike price changes that are correlated only on very short time scales |]31||(a few minutes), 
the absolute values of price changes show long-range power-law correlations on time scales 
up to a year or more [jl4|-|2j| . Previous works have shown that understanding the power-law 
correlations, specifically the values of the exponents, can be helpful for guiding the selection 
of models and mechanisms ||22|| . Therefore, in this part we focus on the quantification of 



power-law correlations of the volatility. To quantify the correlations, we use instead 
of Vr(t), i.e. time window T is set to 1 min with At = 1 min for the best resolution. 



1. Intra- day pattern removal 

It is known that there exist intra-day patterns of market activity in the NYSE and the 
S&P 500 index p2|- |3~4"| , |3T)l . A possible explanation is that information gathers during the 
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time of closure and hence traders are active near the opening hours. And many liquidity 
traders are active near the closing hours [0. We find a similar intra-day pattern in the 
absolute price changes (Fig. [7]). In order to quantify the correlations in absolute price 

changes, it is important to remove this trend, lest there might be spurious correlations. The 
intra-day pattern A(t day ), where t day denotes the time in a day, is defined as the average of 
the absolute price change at time t day of the day for all days. 

\ Sjli \G j (t day )\ 

A(t day ) = , (9) 

where the index j runs over all the trading days N in the 13-year period (N = 3309 in our 
study) and t day denotes the time in the day. In order to avoid the artificial correlation caused 
by this daily oscillation, we remove the intra-day pattern from G(t) which we schematically 
write as: 

g(t) = G{t day )/A{t day ) , (10) 

for all days. Each data point g(t), denotes the normalized absolute price change at time t, 
which is computed by dividing each point G(t day ) at time t day of the day by A{t day ) for all 
days. 

Three methods — correlation function, power spectrum and detrended fluctuation analysis 
(DFA) — are employed to quantify the correlation of the volatility. The pros and cons of 
each method and the relations between them are described in the Appendix. 

2. Correlation quantification 

Fig. |8](a) shows the autocorrelation function of the normalized price changes, g{t), which 
shows exponential decay with a characteristic time of the order of 4 min. However, we find 
that the autocorrelation function of \g(t) \ has power law decay, with long persistence up to 
several months, Fig. ^|(b). This result is consistent with previous studies on several economic 
time series |fT4} 4TS| , pT1 . 
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More accurate results are obtained by the power spectrum (Fig. |9|(a)), which shows that 
the data fit not one but rather two separate power laws: for f > f x , S(f) ~ f^ 1 , while for 
/</x,5(/)~/-^, where 

fa = 0.31 ±0.02 f>f x (11) 

fa = 0.90 ± 0.04 / < f x (12) 

and 

fx = min -1 , (13) 

where f x is the crossover frequency. 

The DFA method confirms our power spectrum results (Fig. ||(a)). From the behavior of 
the power spectrum, we expect that the DFA method will also predict two distinct regions 
of power law behavior, F(t) ~ t ai for t < t x with exponent a.\ = 0.66 and F(t) ~ t a2 for 
t > t x with «2 = 0.95, where the constant time scale t x = l// x , where we have used the 
relation PD| , 

a = (l + P)/2. (14) 

Fig. ^](b) shows the results of the DFA analysis. We observe two power law regions, charac- 
terized by exponents, 

ai= 0.66 ±0.01 t<t x (15) 
a 2 = 0.93 ±0.02 t > t x (16) 

in good agreement with the estimates of the exponents from the power spectrum. The 
crossover time is close to the result obtained from the power spectrum, with 

t x w l/f x w 600 min (17) 

or approximately 1.5 trading days. 
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B. Volatility correlations for individual companies 



The observed correlations in the price changes and the absolute price changes for the 
S&P 500 index raises the question if similar correlations are present for individual companies 



which comprise the S&P 500 index [29 



In the absolute price changes of the individual companies, there is also a strongly marked 
intra-day pattern, similar to that of the S&P 500 index. We compute the intra-day pattern 
for single companies in the same sense as before, 

A(t day ) es gikMlwi , (18) 

where time tday refers to the time in the day, the index i denotes companies, and the index 
j runs over all days — 504 days. In Fig. |7| we show the intra-day pattern, averaged over all 
the 500 companies and contrast it with that of the S&P 500 stock index. 

In order to avoid the intra-day pattern in our quantification of the correlations, we define 
a normalized price change for each company, 

9i {t) = Giit^/Aiitday) . (19) 

The average autocorrelation function of gi(t), i = 1,2, . . .500, shows weak correlations 
up to lOmin, after which there is no statistically significant correlation. The average au- 
tocorrelation function for the absolute price changes shows long persistence. We quantify 
the long-range correlations by two methods — power spectrum and DFA. In Fig. [10|(a) , we 
show the power spectral density for the absolute price changes for individual companies and 
contrast it with the S&P 500 index for the same 2-year period. We also observe a similar 
crossover phenomena as that observed for the S&P 500 index. The exponents of the two 
observed power laws are, 

A = 0.20 ± 0.02 / > f x (20) 
fo = 0.50 ±0.05 f<f x (21) 

where the crossover frequency is 
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/ x = — mm" 1 . (22) 
J 700 y ' 

In Fig. 0(b), we confirm the power spectrum results by the DFA method. We observe 
two power law regimes with 

ai = 0.60 ±0.01 t<t x (23) 

a 2 = 0.74 ±0.03 t > t x (24) 

with a crossover 

t x « l/f x « 700 min . (25) 

The exponents characterizing the correlations in the absolute price changes for individual 
companies are on average smaller than what is observed for the S&P 500 price changes. 
This might be due to the cross-dependencies between price changes of different companies. 
A systematic study of the cross-correlations and dependencies will be the subject of future 
work 



C. Additional remarks on power-law volatility correlations 

Even though several different methods give consistent results, the power-law correlation 
of the volatility needs to be tested. It is known that the power-law correlation could be 
caused by some artifacts, e.g. anomaly of the data or the peculiar shape of the distribution 
etc. 



1. Data shuffling 

Since we find the volatility to be power-law distributed at the tail, to test that the 
power-law correlation is not a spurious artifact of the long-tailed probability distribution, 
we shuffled each point of the \g(t)\ randomly for the S&P500 data. The shuffling operation 
keeps the distribution of \g(t)\ unchanged, but destroys the correlations in the time series 
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totally if there are any. DFA measurement of this randomly shuffled data does not show any 
correlations and gives exponent a = 0.50 (Fig. |9|) — confirming that the observed long-range 
correlation is not due to the heavy-tailed distribution of the volatility. 

2. Outliers removal 

As an additional test, we study how the outliers (big events) of the time series \g(t)\ 
affect the observed power-law correlation. We removed the largest 5% and 10% events of 
the \g(t)\ series and applied the DFA method to them respectively, the results are shown 
in Fig. |TT]. Removing the outliers does not change the power-law correlations for the short 
time scale. However, that the outliers do have an effect on the long time scale correlations, 
the crossover time is also affected. 

3. Subregion correlation 

The long range correlation and the crossover behavior observed for the S&P500 index 
are for the entire 13-year period. Next, we study whether the exponents characterizing the 
power-law correlation are stable, i.e. does it still hold for periods smaller than 13 years. 
We choose a sliding window (with size 1 y) and calculate both exponents ot\ and «2 within 
this window as the window is dragged down the data set with one month steps. We find 
(Fig. |T3(b)) that the value of a\ is very "stable" (independent of the position of the window), 
fluctuating slightly around the mean value 2/3. However, the variation of a 2 is much greater, 
showing sudden jumps when very volatile periods enter or leave the time window. Note that 
the error in estimating «2 is also large. 

VI. CONCLUSION 

In this study, we find that the probability density function of the volatility for the S&P 
500 index seems to be well fit by a log normal distribution in the center part. However, the 
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tail of the distribution is better described by a power law, with exponent 1 + fi « 4, well 
outside the stable Levy range. The power law distribution at the tail is confirmed by the 
study of the volatility distribution of individual companies, for which we find approximately 
the same exponent. We also find that the distribution of the volatility scales for a range of 
time intervals. 

We use the Detrended fluctuation analysis and the power spectrum to quantify correla- 
tions in the volatility of the S&P 500 index and individual company stocks. We find that 
the volatility is long-range correlated. Both the power spectrum and the DFA methods 
show two regions characterized by different power law behaviors with a cross-over at ap- 
proximately 1.5 days. Moreover, the correlations show power-law decay, often observed in 
numerous phenomena that have a self-similar or "fractal" origin. The scaling property of 
the volatility distribution, its power-law asymptotic behavior, and the long-range volatility 
correlations suggest that volatility correlations might be one possible explanation for the 
observed scaling behavior JHJ for the distribution of price changes p8| . 
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APPENDIX A: METHODS TO CALCULATE CORRELATIONS 

A. Correlation function 

The direct method to study the correlation property is the autocorrelation function, 

_ (gfo)gfa + t)) - (g(i )) 2 
C{t) = to 2 (*o)} " to«o)} 2 ' (26) 
where t is the time lag. Potential difficulties of the correlation function estimation are the 

following: (i) The correlation function assumes stationarity of the time series. This criterion 
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is not usually satisfied by real-world data, (u) The correlation function is sensitive to the 
true average value, (g(t )), of the time series, which is difficult to calculate reliably in many 
cases. Thus the correlation function can sometimes provide only qualitative estimation pO . 



B. Power spectrum 



A second widely used method for calculating correlation properties is the power spectrum 
analysis. Note that the power spectrum analysis can only be applied to linear and stationary 
(or strictly periodic) time series. 



The third method we use to quantify the correlation properties is called detrended fluc- 



measuring the self-similar feature can indirectly tell us information about the correlation 
properties. The advantages of DFA over conventional methods (e.g. spectral analysis and 
Hurst analysis) are that it permits the detection of long-range correlations embedded in a 
non-stationary time series, and also avoids the spurious detection of apparent long-range 
correlations that are an artifact of non-stationarities. This method has been validated on 
control time series that consist of long-range correlations with the superposition of a non- 
stationary external trend |38| . The DFA method has also been successfully applied to detect 
long-range correlations in highly complex heart beat time series [|39| , [40|| , and other physio- 
logical signals [[4T|j . 

A description of the DFA algorithm in the context of heart beat analysis appears else- 
where p8| , |39| . For our problem, we first integrate \g{i) \ time series (with N total data points, 



C. Detrended fluctuation analysis 



tuation analysis (DFA) p8| , |39| . The DFA method is based on the idea that a correlated 



time series can be mapped to a self-similar process by integration p0| , p8| , p9[ . Therefore, 



Fig- 0), 



t' 




(27) 



i=i 
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Next the integrated time series is divided into boxes of equal length t. In each box, a least 
squares line is fit to the data (representing the trend in that box). The y coordinate of the 
straight line segments is denoted by yt{t')- Next we de-trend the integrated time series, y(t'), 
by subtracting the local trend, yt{t'), in each box. The root-mean-square fluctuation of this 
integrated and detrended time series is calculated 



F(t) 



1 N 



^Y,W)-yt(tW- (28) 



This computation is repeated over all time scales (box sizes) to provide a relationship 
between F(t), the average fluctuation, and the box size t. In our case, the box size t ranged 
from lOmin to 10 5 min (the upper bound of t is determined by the actual data length). 



Typically, F(t) will increase with box size t (Fig. |l3](b)). A linear relationship on a double 
log graph indicates the presence of power law scaling. Under such conditions, the fluctuations 
can be characterized by a scaling exponent a, the slope of the line relating log F(t) to logt 



Fig. [11(b)) 



In summary, we have the following relationships between above three methods: 

• For white noise, where the value at one instant is completely uncorrelated with any 
previous values, the integrated value, y{t'), corresponds to a random walk and therefore 



a = 0.5, as expected from the central limit theorem [f42H44|l. The autocorrelation 
function, C(t), is for any t (time-lag) not equal to 0. The power spectrum is flat in 
this case. 

Many natural phenomena are characterized by short-term correlations with a charac- 
teristic time scale, r, and an autocorrelation function, C (t) that decays exponentially, 
[i.e., C(t) ~ exp(— t/r)\. The initial slope of logF(t) vs. logt may be different from 
0.5, nonetheless the asymptotic behavior for large window sizes t with a = 0.5 would 
be unchanged from the purely random case. The power spectrum in this case will 
show a crossover from 1/f 2 at high frequencies to a constant value (white) at low 
frequencies. 
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An a greater than 0.5 and less than or equal to 1.0 indicates persistent long-range 
power-law correlations, i.e., C(t) ~ t~ 7 . The relation between a and 7 is 

7 = 2 - 2a . (29) 

Note also that the power spectrum, S(f), of the original signal is also of a power-law 
form, i.e., S(f) ~ I//' 3 - Because the power spectrum density is simply the Fourier 
transform of the autocorrelation function, (3 = 1 — 7 = 2a — 1. The case of a = 1 is a 
special one which has long interested physicists and biologists — it corresponds to 1/f 
noise (/3 = 1). 

When < a < 0.5, power-law anti- correlations are present such that large values are 



more likely to be followed by small values and vice versa [30 



• When a > 1, correlations exist but cease to be of a power-law form. 

The a exponent can also be viewed as an indicator of the "roughness" of the original 
time series: the larger the value of a, the smoother the time series. In this context, 1/f noise 
can be interpreted as a compromise or "trade-off" between the complete unpredictability 
of white noise (very rough "landscape") and the much smoother landscape of Brownian 
noise |I5[| . 
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FIG. 1. (a) Data analyzed: The S&P 500 index Z(t) for the 13-year period 3 Jan 1984 - 31 
Dec 1996 at sampling intervals At = 1 min. These data include the data set analyzed by Mantegna 
and Stanley |l3| and the extension of 7 extra years. Note the large fluctuations, such as that on 
19 Oct 1987 ("black Monday"). The index Z(t) has an increasing trend except for some crashes, 
such as the crashes in Oct. 1987 and May 1990. For the period studied the index can apparently 
be fit by a straight line on a semi-log graph, i.e., exponential growth with annual increase rate of 
~ 15%. (b) Amplitude of fluctuations, |G(t)| (see text for definition), with At = lmin. 
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FIG. 2. Volatility Vr(t) with T = 1 month (8190 min) and sampling time interval At = 30min 
of the S&P 500 index for the entire 13-year period 1984-96. The highlighted block shows possible 
"precursors" of the Oct. '87 crash. 
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FIG. 3. (a) Probability distribution of the volatility on a log-log scale with different time 
windows T with At = 30 min. The center part of the distribution shows a quadratic behavior on 
the log-log scale. The asymptotic behavior seems consistent with a power-law. (b) Center of the 
distribution: The volatility distribution for different window sizes T using the log-normal scaling 
form, • v /i/exp(a + v/A)P(Vt) as a function of (ln(Vr) — a)/ '^/ttV, where a and v are the mean and 
the width on a logarithmic scale. The scaled distributions are shown for the region shown by the 
box in (a). By the scaling, all curves collapse to the log- normal form with a = and v = 1, 
exp(— (lnx) 2 ) (solid line). 
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FIG. 4. Comparison of the log-normal and Gaussian fits for the volatility distribution for 



T = 300 min and At = 30 min. 
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FIG. 5. (a)The cumulative distribution function of the volatility, scaled by the standard 
deviation, for time scales T = 32, 64, 128 min with sampling time interval A = 1 min, using 
non-overlapping windows for the S&P 500 stock index. Regression lines yield estimates of the 
exponent ji = 3.10 ± 0.08 for T = 32 min, fi = 3.19 ± 0.10 for T = 64 min and fi = 3.30 ± 0.15 for 
T = 128 min. The fits were performed over the range of scaled volatility greater than 1 standard 
deviation. Choices of A from 16 min and 32 min were also studied for the same values of T shown. 
Results obtained for these cases and the values of n obtained are consistent with the present case. 
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FIG. 6. (a) The cumulative probability distribution on a log-log scale of the normalized 
volatility for all the 500 individual companies for various averaging window lengths, with a sampling 
time At = 5 min. Power law regression fits yield ji = 3.10 ± 0.11 for T = 10 min, fi = 3.16 ± 0.15 
for T = 20 min, fj, = 3.28 ± 0.17 for T = 40 min, and fi = 3.38 ± 0.18 for T = 80 min. These fits 
were performed in the region of scaled volatility between 1 and 30 standard deviations, (b) The 
probability density function of the normalized volatility for single companies. Regression fits yield 
a slope of 1+n = 4.06±0.10 for T = 10 min, 1 + p = 4.15±0.13 for T = 20 min, l+/x = 4.22±0.15 
for T = 40 min, and 1 + fi = 4.38 ± 0.16 for T = 80 min. The fits were performed in the region of 
scaled volatility between 1 and 50 standard deviations. 
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FIG. 7. The 1-minute interval intra-day pattern for absolute price changes of the S&P 500 
stock index (1984-96) (shifted) and for the absolute price changes, averaged for the chosen 500 
companies (1994-95). The length of the day is 390 minutes. In order to avoid the detection 
of spurious correlations, this daily pattern is removed. Otherwise one finds peaks in the power 
spectrum at the frequencies of 1 /day and larger. Note that both the curves have a similar pattern 
with large values within the first 15 minutes after the market opens. 
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FIG. 8. (a) Semi-log plot of the autocorrelation function of g(t), (b) Autocorrelation function 
of \g(t)\ in the double log plot, with sampling time interval At = lmin. The autocorrelation func- 
tion of g(t) decays exponentially to zero within half an hour, C(t) ~ exp(— t/r) with r ~ 4.0 min. 
A power law correlation , C(t) ~ i~ 7 , exists in the \g(t)\ for more than 3 decades. Note that both 
graphs are truncated at the first zero value of C (t) . The solid line in (b) is the fit to the function 
Y^-pj from which we obtain 7 = 0.30 ± 0.08. The horizontal dashed line indicates the noise level. 
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FIG. 9. Plot of (a) the power spectrum S(f) and (b) the detrended fluctuation analysis F(t) of 
the absolute values of detrended increments g(t) with the sampling time interval At = 1 min. The 
lines show the best power law fits (R values are better than 0.99) above and below the crossover 
frequency of f x = (1/570) min -1 in (a) and of the crossover time, t x = 600 min in (b). The 
triangles show the power spectrum and DFA results for the "control", shuffled data (see text for 
details). 
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FIG. 10. (a) The power spectrum for the absolute values of the normalized price changes for 
individual companies, with the sampling time interval At = 5 min. This is obtained by averaging 
the power spectrum <%(/) for all the 500 companies. We contrast this with the power spectrum of 
the S&P 500 for the same 2-year period 1994-95. Similar to the S&P 500, we observe two power 
laws separated by a crossover frequency. Power law regression fits yield exponents (3\ = 0.20 for 
the high frequency region and /?2 = 0.50 for the low frequency region. The crossover occurs at 
approximately 700 min — slightly larger than that found for the S&P 500 index, (b) The average 
DFA results of 5 min sampled \g(t)\ for the single companies, averaged over all 500 companies. It 
is contrasted with the result of the S&P 500 index. There are two regions characterized by power 
laws with exponents a\ = 0.60 for small time lags and ct2 = 0.74 for large time lags. 
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FIG. 11. DFA results of removing top 5% and 10% data points of the \g(t)\ for the S&P 
500 data. The crossover time is approximately 600 min, l,000min, and 10, 000 min for the data 
removing the top 5% and the top 10% respectively. The DFA exponent a\ for the short time scale 
does not change, the power law regression fit gives a\ « 0.66 for all three curves. Regression fits 
for the exponent «2 give 0.91 ± 0.02, 0.91 ± 0.03 and 1.02 ± 0.04 for three cases, respectively. 
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FIG. 12. (a) The S&P 500 index Z(t) for the 13-year period, (b) Results of dragging a window 
of size 1 y down the same data base, one month at a time, and calculating the best fit exponent 
cci (dashed line) and ct2 (full line) for the time intervals t < t x and t > t x respectively, where 
t x = 600 min. 
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FIG. 13. Description of DFA method (see text). 
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